Homework 1, Parallelized Dynamics

Marcus Abate

September 28th, 2020

Due October 1st, 2020 at midnight EST.

Problem 1: A Ton of New Facts on Newton

In this problem we will look into Newton's method. Newton's method is the dynamical system defined by the update process:

$$x_{n+1} = x_n - \left(\frac{dg}{dx}(x_n)\right)^{-1} g(x_n)$$

For these problems, assume that $\frac{dg}{dx}$ is non-singular.

Part 1

Show that if $x^\ast$ is a steady state of the equation, then $g(x^\ast) = 0$.

Because $x^*$ is assumed to be a steady state of the system, we know that

$$ x_{n+1} = x_n = x^* $$

i.e. the state does not change under the update. Substituting $x^*$ into the update rule, the correction term must therefore vanish:

$$ \left( \frac{dg}{dx}(x^*) \right)^{-1} g(x^*) = 0 $$

Because $\frac{dg}{dx}(x^*)$ is non-singular, its inverse is also non-singular; multiplying both sides on the left by $\frac{dg}{dx}(x^*)$ gives $g(x^*) = 0$.
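As a quick numerical sanity check (a Python sketch; the test function $g(x) = x^2 - 2$ is an arbitrary choice of my own), the iteration settles at a steady state, and $g$ vanishes there:

```python
# Newton's method on g(x) = x^2 - 2: the iteration settles at a steady
# state x*, and that steady state is a root of g (here x* = sqrt(2)).
def newton(g, dg, x, iters=50):
    for _ in range(iters):
        x = x - g(x) / dg(x)
    return x

g = lambda x: x**2 - 2
dg = lambda x: 2 * x

x_star = newton(g, dg, 1.0)
print(x_star)          # ~1.4142135 (sqrt(2))
print(abs(g(x_star)))  # ~0: g vanishes at the steady state
```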

Part 2

Take a look at the Quasi-Newton approximation:

$$x_{n+1} = x_n - \left(\frac{dg}{dx}(x_0)\right)^{-1} g(x_n)$$

for some fixed $x_0$. Derive the stability of the Quasi-Newton approximation in the form of a matrix whose eigenvalues need to be constrained. Use this to argue that if $x_0$ is sufficiently close to $x^\ast$ then the steady state is a stable (attracting) steady state.

We will assume that $\frac{dg}{dx}$ is continuous and that $x_n$ is a vector, $x_n \in \mathbb{R}^k$. Because $x_0$ is fixed, we have the following:

$$ A = \frac{dg}{dx}(x_0) $$

where $A$ is a constant (precomputable) $\mathbb{R}^{k \times k}$ matrix. Our update step looks like this:

$$ x_{n+1} = x_n - A^{-1}g(x_n) $$

This is a discrete dynamical system, similar to the multivariable case described in the lecture notes. To determine stability, linearize the update map $F(x) = x - A^{-1}g(x)$ about the steady state $x^*$. Writing the error as $e_n = x_n - x^*$ and using $g(x^*) = 0$ from Part 1,

$$ e_{n+1} = e_n - A^{-1}\left( g(x_n) - g(x^*) \right) \approx \left( I - A^{-1}\frac{dg}{dx}(x^*) \right) e_n = B e_n $$

Stability is therefore governed by the matrix $B = I - A^{-1}\frac{dg}{dx}(x^*)$, whose eigenvalues need to be constrained: if all of the eigenvalues of $B$ lie inside the unit circle, then $e_n \to 0$ and the iteration is a contraction near $x^*$.

Because $\frac{dg}{dx}$ is continuous, choosing $x_0$ sufficiently close to $x^*$ makes $A = \frac{dg}{dx}(x_0)$ close to $\frac{dg}{dx}(x^*)$, so

$$ B = I - A^{-1}\frac{dg}{dx}(x^*) \approx I - I = 0 $$

and the eigenvalues of $B$ are close to zero, well inside the unit circle. This implies that there exists a neighborhood around $x^*$ with convergence to the steady state. If $x_0$ is in this neighborhood, then the steady state $x^*$ is stable (attracting).
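A small numerical illustration of this argument (a Python sketch; the 2-D system $g(x) = (x_1^2 + x_2 - 2,\; x_1 + x_2^2 - 2)$ with root $x^* = (1,1)$ and the linearization point $x_0 = (1.2, 0.9)$ are my own choices):

```python
# Quasi-Newton on a 2-D system with root x* = (1, 1). The Jacobian is
# frozen at x0, so A^{-1} is computed once; because x0 is close to x*,
# B = I - A^{-1} J(x*) has small eigenvalues and the iteration contracts.

def g(x):
    return [x[0]**2 + x[1] - 2, x[0] + x[1]**2 - 2]

def jac(x):  # analytic Jacobian of g
    return [[2 * x[0], 1.0], [1.0, 2 * x[1]]]

def inv2(M):  # inverse of a 2x2 matrix
    a, b = M[0]
    c, d = M[1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

x0 = [1.2, 0.9]       # linearization point, close to x* = (1, 1)
Ainv = inv2(jac(x0))  # A^{-1}, precomputed once

x = list(x0)
for _ in range(100):  # x_{n+1} = x_n - A^{-1} g(x_n)
    step = matvec(Ainv, g(x))
    x = [x[0] - step[0], x[1] - step[1]]

print(x)  # ≈ [1.0, 1.0]: the iteration converges to the steady state
```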

Part 3

Relaxed Quasi-Newton is the method:

$$x_{n+1} = x_n - \alpha \left(\frac{dg}{dx}(x_0)\right)^{-1} g(x_n)$$

Argue that for some sufficiently small $\alpha$ that the Quasi-Newton iterations will be stable if the eigenvalues of $(\left(\frac{dg}{dx}(x_0)\right)^{-1} g(x_n))^\prime$ are all positive for every $x$.

(Technically, these assumptions can be greatly relaxed, but weird cases arise. When $x \in \mathbb{C}$, this holds except on some set of Lebesgue measure zero. Feel free to explore this.)

As in the previous part, stability requires the eigenvalues of the linearized update map to lie inside the unit circle. The given matrix

$$ M(x) = \left( \left(\frac{dg}{dx}(x_0)\right)^{-1} g(x) \right)' = \left(\frac{dg}{dx}(x_0)\right)^{-1} \frac{dg}{dx}(x) $$

is the Jacobian of the update direction (not the Hessian of $g$). The relaxed iteration map is $F(x) = x - \alpha \left(\frac{dg}{dx}(x_0)\right)^{-1} g(x)$, whose Jacobian is

$$ F'(x) = I - \alpha M(x) $$

If the eigenvalues $\lambda_i$ of $M(x)$ are all positive for every $x$, then the eigenvalues of $F'(x)$ are $1 - \alpha \lambda_i$ (the eigenvectors are shared). For sufficiently small $\alpha > 0$, every $1 - \alpha \lambda_i$ lies in $(0, 1)$, inside the unit circle, so the relaxed Quasi-Newton iteration is stable near the steady state. The role of the small $\alpha$ is to shrink the step enough that even the largest eigenvalue of $M$ maps inside the unit circle.
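To see the eigenvalue mechanics concretely, here is a Python sketch of the linearized error dynamics $e_{n+1} = (I - \alpha M)e_n$; the eigenvalues $1$ and $30$ chosen for $M$ are illustrative values of my own:

```python
# Linearized error dynamics of relaxed Quasi-Newton: e_{n+1} = (I - alpha*M) e_n,
# where M stands for (dg/dx(x0))^{-1} dg/dx(x*). Take M with positive
# eigenvalues 1 and 30: alpha = 1 diverges because |1 - 30| > 1, while a
# small alpha contracts every mode because |1 - alpha*lam| < 1.

def iterate_error(eigs, alpha, e0=1.0, iters=200):
    # each eigenmode evolves independently: e_{n+1} = (1 - alpha*lam) e_n
    return [e0 * (1 - alpha * lam) ** iters for lam in eigs]

eigs = [1.0, 30.0]

diverged = iterate_error(eigs, alpha=1.0)
stable = iterate_error(eigs, alpha=0.05)

print(diverged)  # second mode blows up: |1 - 30|^200 is astronomically large
print(stable)    # both modes decay toward 0
```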

Part 4

Fixed point iteration is the dynamical system

$$x_{n+1} = g(x_n)$$

which converges to $g(x)=x$.

  1. What is a small change to the dynamical system that could be done such that $g(x)=0$ is the steady state?
  2. How can you change the $\left(\frac{dg}{dx}(x_0)\right)^{-1}$ term from the Quasi-Newton iteration to get a method equivalent to fixed point iteration? What does this imply about the difference in stability between Quasi-Newton and fixed point iteration if $\frac{dg}{dx}$ has large eigenvalues?

To make $g(x) = 0$ the steady state, we add an $x_n$ term to the update:

$$ x_{n+1} = x_n - g(x_n) $$

Then $x_{n+1} = x_n$ exactly when $g(x_n) = 0$, so the roots of $g$ are the steady states. This is the Quasi-Newton method from the previous parts with the $\left(\frac{dg}{dx}(x_0)\right)^{-1}$ term replaced by the identity.

If we replace the Jacobian-inverse term with the identity matrix:

$$ \left( \frac{dg}{dx}(x_0) \right)^{-1} = I $$

then Quasi-Newton reduces exactly to the fixed-point iteration above. The linearized iteration matrix becomes $I - \frac{dg}{dx}$, so if $\frac{dg}{dx}$ has large eigenvalues $\lambda$, the fixed-point eigenvalues $1 - \lambda$ lie far outside the unit circle and the iteration diverges, while Quasi-Newton rescales them toward zero and remains stable. The trade-off is cost: fixed-point iteration never evaluates or inverts the Jacobian, so each step is cheaper, at the price of much weaker stability.

From this it is also clear that Newton's method is a special case of fixed-point iteration: the Newton update step can itself be taken as the map $g(x_n)$ in a fixed-point system. This means that the general theory of fixed-point iterations applies to Newton's method as well.
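A scalar Python sketch of this stability gap (the test function $g(x) = 5(x-2)$, with the "large eigenvalue" $\frac{dg}{dx} = 5$, is my own choice):

```python
# When dg/dx has large eigenvalues, fixed-point iteration x_{n+1} = x_n - g(x_n)
# can diverge while Quasi-Newton stays stable. Here g(x) = 5(x - 2) has
# root x* = 2 and dg/dx = 5.

def g(x):
    return 5 * (x - 2)

def run(update, x=2.5, iters=20):
    for _ in range(iters):
        x = update(x)
    return x

fixed_point = run(lambda x: x - g(x))         # error scales by (1 - 5) = -4 each step
quasi_newton = run(lambda x: x - g(x) / 5.0)  # A^{-1} = 1/5 rescales: stable

print(fixed_point)   # ~5.5e11: the iteration diverged
print(quasi_newton)  # 2.0: converged in one step
```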

Problem 2: The Root of all Problems

In this problem we will practice writing fast and type-generic Julia code by producing an algorithm that will compute the quantile of any probability distribution.

Part 1

Many problems can be interpreted as a rootfinding problem. For example, let's take a look at a problem in statistics. Let $X$ be a random variable with a cumulative distribution function (CDF) $cdf(x)$. Recall that the CDF is a monotonically increasing function with values in $[0,1]$ which gives the total probability that $X < x$. The $y$th quantile of $X$ is the value $x$ at which $X$ has probability $y$ of being less than $x$. Interpret the problem of computing an arbitrary quantile $y$ as a rootfinding problem, and use Newton's method to write an algorithm for computing $x$.

(Hint: Recall that $cdf^{\prime}(x) = pdf(x)$, the probability density function.)

For a given cumulative distribution function $cdf(x)$, we can compute the quantile $y$ by finding the root of $f(x) = cdf(x) - y$ using Newton's method. The algorithm uses the usual Newton update step:

$$ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} $$

where $f(x) = cdf(x) - y$ and $f'(x) = cdf'(x) = pdf(x)$, the probability density function (subtracting the constant $y$ does not change the derivative).

"""my_quantile pseudocode
   computes the quantile y of continuous distribution function d
   @param y: percentile
   @param d: Univariate Distribution
"""
function my_quantile(y,d):
    x = 0
    x_m1 = x
    while(true):
        f = cdf(d, x) - y
        fprime = pdf(d, x)

        x_m1 = x
        x = x - f/fprime

        if abs(x - x_m1) < 1e-6:
            break
    return x

Part 2

Use the types from Distributions.jl to write a function my_quantile(y,d) which uses multiple dispatch to compute the $y$th quantile for any UnivariateDistribution d from Distributions.jl. Test your function on Gamma(5, 1), Normal(0, 1), and Beta(2, 4) against the Distributions.quantile function built into the library.

(Hint: Have a keyword argument for $x_0$, and let its default be the mean or median of the distribution.)
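As a language-agnostic sanity check of the approach (the assignment itself targets Julia's Distributions.jl types and multiple dispatch), here is a Python sketch specialized to the standard normal, whose CDF is available in closed form via `math.erf`:

```python
import math

# Newton quantile solver for the standard normal, using only the
# standard library: cdf via erf, pdf via the Gaussian density.

def norm_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def norm_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def my_quantile(y, x0=0.0, tol=1e-10):
    x = x0  # start at the mean, as the hint suggests
    while True:
        x_prev = x
        x -= (norm_cdf(x) - y) / norm_pdf(x)
        if abs(x - x_prev) < tol:
            return x

print(my_quantile(0.975))  # ~1.959964, the familiar 97.5% z-value
print(my_quantile(0.5))    # ~0.0, the median of N(0, 1)
```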

Problem 3: Bifurcating Data for Parallelism

In this problem we will write code for efficient generation of the bifurcation diagram of the logistic equation.

Part 1

The logistic equation is the dynamical system given by the update relation:

$$x_{n+1} = rx_n (1-x_n)$$

where $r$ is some parameter. Write a function which iterates the equation from $x_0 = 0.25$ enough times to be sufficiently close to its long-term behavior (400 iterations) and samples 150 points from the steady state attractor (i.e. output iterations 401:550) as a function of $r$, and mutates some vector as a solution, i.e. calc_attractor!(out,f,p,num_attract=150;warmup=400).

Test your function with $r = 2.9$. Double check that your function computes the correct result by calculating the analytical steady state value.

For our purposes, the input parameter $f$ will be the logistic update function $f(x, p) = p x (1 - x)$, and $p$ will represent the parameters. In the univariate case, this is $p = r$.

There is generally no closed-form solution for the trajectory of the logistic difference equation, but the steady state itself is available analytically: it must satisfy $x = rx(1 - x)$, whose nonzero solution is

$$ x^* = \frac{r - 1}{r} \approx 0.6552 $$

when $r = 2.9$. We can check whether the function works:

Our computed result closely matches this analytical value.
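A minimal sketch of the computation (in Python for illustration; the homework asks for a mutating Julia calc_attractor!):

```python
# Iterate the logistic map past a warmup to discard transients, then
# sample the steady-state attractor.

def calc_attractor(r, x0=0.25, num_attract=150, warmup=400):
    x = x0
    for _ in range(warmup):       # iterations 1:400, discarded
        x = r * x * (1 - x)
    out = []
    for _ in range(num_attract):  # iterations 401:550, the attractor samples
        x = r * x * (1 - x)
        out.append(x)
    return out

attractor = calc_attractor(2.9)
print(attractor[-1])  # ~0.6551724 = (r - 1) / r, the analytic fixed point
```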

Part 2

The bifurcation plot shows how a steady state changes as a parameter changes. Compute the long-term result of the logistic equation at the values of r = 2.9:0.001:4, and plot the steady state values for each $r$ as an r x steady_attractor scatter plot. You should get a very bizarrely awesome picture, the bifurcation graph of the logistic equation.

(Hint: Generate a single matrix for the attractor values, and use calc_attractor! on views of columns for calculating the output, or inline the calc_attractor! computation directly onto the matrix, or even give calc_attractor! an input for what column to modify.)

Part 3

Multithread your bifurcation graph generator by performing different steady state calculations on different threads. Does your timing improve? Why? Be careful and check to make sure you have more than 1 thread!

The timing does improve, because the parameter sweep is split over multiple threads. On my machine there are 4 threads, and the observed speedup is roughly 3.5x, close to the theoretical limit of 4x.
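The structure of the threaded sweep can be sketched as follows (a Python sketch: in Julia, Threads.@threads gives true parallel speedup on this work, whereas CPython's GIL serializes pure-Python computation, so this shows only the structure, not the speedup):

```python
# Split the r-sweep across worker threads: each thread computes the
# attractor for a subset of the r values, independently of the others.
from concurrent.futures import ThreadPoolExecutor

def calc_attractor(r, x0=0.25, num_attract=150, warmup=400):
    x = x0
    for _ in range(warmup):
        x = r * x * (1 - x)
    samples = []
    for _ in range(num_attract):
        x = r * x * (1 - x)
        samples.append(x)
    return samples

rs = [2.9 + 0.001 * i for i in range(1101)]  # r = 2.9:0.001:4
with ThreadPoolExecutor(max_workers=4) as pool:
    data = list(pool.map(calc_attractor, rs))  # one attractor column per r

print(len(data))  # 1101 columns, identical to the serial result
```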

Part 4

Multiprocess your bifurcation graph generator first by using pmap, and then by using @distributed. Does your timing improve? Why? Be careful to add processes before doing the distributed call.

(Note: You may need to change your implementation around to be allocating differently in order for it to be compatible with multiprocessing!)

Timing does not improve relative to the multithreaded case. @distributed lands in the same order of magnitude as the serial case, likely because its static scheduler assigns fixed chunks of work up front; pmap is far worse, since its dynamic scheduler pays per-task communication overhead. Neither beats the multithreaded version because the interprocess communication and data-movement overhead of multiprocessing is not justified by a problem of this type and size.

Part 5

Which method is the fastest? Why?

Multithreading is the fastest. The per-parameter computations are independent and relatively cheap, so the sweep parallelizes cleanly across threads in shared memory, while the problem is too small for multiprocessing to amortize its communication overhead.